.. Copyright 2019-2020 Kenneth Lee

:Authors: Kenneth Lee
:Version: 1.0

WarpDrive as a General Heterogeneous Platform
**********************************************

More and more heterogeneous systems are being introduced into industry these
days, because a CPU is better suited to control than to computation.

OpenCL is considered a general way to access a heterogeneous system (called a
"device" in this document; it can be a TPU, NPU, CPU, GPU, or any other
accelerator). But many developers still prefer a native interface such as
CUDA. I think one of the reasons is that OpenCL standardizes the device
interface but does not simplify anything. It does not even hide the
differences in device configuration between revisions. So there is not much
benefit in adopting it while a native interface provides faster updates and
device-specific optimization.

We introduce WarpDrive as a general platform for heterogeneous systems from
another perspective. It standardizes only the memory model and solves a
problem that most solutions cannot ignore: the device must be used by
applications in user space.

If you have a bunch of data in user space and you need your device program to
take care of it, you need to share the user address space with the device.
The IOMMU is the only way to provide protection without heavy kernel
interaction.

WarpDrive manages the IOMMU for you to share the address space between the
application and the device. When an application opens a WarpDrive-enabled
device, it is bound to the device and shares its address space with the
device. You can have your device program (and data) ready in memory and add a
request to the device fd (via ioctl or a direct MMIO doorbell), so the device
can fetch it accordingly. The device program can refer to the same addresses
as the user application does on the CPU side. With proper setup, you can
access the hardware without any syscall in the data path.

WarpDrive does not care how the device program is generated, and it does not
care about device memory.
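
The request flow described above (data prepared in main memory, a request added to a queue, the device dereferencing the same addresses as the application) can be sketched in plain C. This is only a software model under stated assumptions, not the WarpDrive API: every name here (``wd_sketch_req``, ``wd_sketch_queue``, ``wd_sketch_send``, ``wd_sketch_device_poll``, ``wd_sketch_demo``) is hypothetical, the "device" is an ordinary function running on the CPU, and bumping ``tail`` merely stands in for an ioctl or MMIO doorbell write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request descriptor. It carries plain user-space virtual
 * addresses; on real hardware this only works because the IOMMU lets the
 * device share the process address space. */
struct wd_sketch_req {
    const uint8_t *src; /* input buffer in main memory */
    uint8_t *dst;       /* output buffer in main memory */
    size_t len;         /* number of bytes to process */
    int done;           /* set by the "device" on completion */
};

#define WD_SKETCH_QDEPTH 8u

/* Hypothetical request queue shared by the application and the device. */
struct wd_sketch_queue {
    struct wd_sketch_req *ring[WD_SKETCH_QDEPTH];
    unsigned head; /* next slot the device consumes */
    unsigned tail; /* next slot the application fills */
};

/* Application side: queue a request. Only the pointer is queued; the data
 * itself is never copied. Returns 0 on success, -1 if the queue is full. */
static int wd_sketch_send(struct wd_sketch_queue *q, struct wd_sketch_req *req)
{
    assert(q != NULL && req != NULL);
    if (q->tail - q->head == WD_SKETCH_QDEPTH)
        return -1;
    req->done = 0;
    q->ring[q->tail % WD_SKETCH_QDEPTH] = req;
    q->tail++; /* stand-in for ringing the doorbell */
    return 0;
}

/* Device side (simulated): drain the queue and process each request by
 * dereferencing the very same addresses the application used. The
 * "computation" is a byte-wise inversion, chosen only for illustration.
 * Returns the number of requests handled. */
static int wd_sketch_device_poll(struct wd_sketch_queue *q)
{
    int handled = 0;

    while (q->head != q->tail) {
        struct wd_sketch_req *req = q->ring[q->head % WD_SKETCH_QDEPTH];

        for (size_t i = 0; i < req->len; i++)
            req->dst[i] = (uint8_t)(req->src[i] ^ 0xFFu);
        req->done = 1;
        q->head++;
        handled++;
    }
    return handled;
}

/* End-to-end demo: input and output both live in ordinary process memory.
 * Returns 1 if the request completed with the expected result, 0 if the
 * result is wrong, and -1 if the request could not be queued. */
static int wd_sketch_demo(void)
{
    struct wd_sketch_queue q = { .head = 0, .tail = 0 };
    uint8_t in[4] = { 0x00, 0x0F, 0xF0, 0xFF };
    uint8_t out[4] = { 0, 0, 0, 0 };
    struct wd_sketch_req req = { .src = in, .dst = out, .len = sizeof(in) };

    if (wd_sketch_send(&q, &req) != 0)
        return -1;
    wd_sketch_device_poll(&q);
    return req.done && out[0] == 0xFF && out[1] == 0xF0 &&
           out[2] == 0x0F && out[3] == 0x00;
}
```

The point of the sketch is that the descriptor holds raw pointers: the application places its input in main memory and reads the result back from main memory, and nothing in the queue refers to device-side buffers.
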
Device memory is the device's problem and should be left to the device
itself. It is the device program's problem if it wants to copy data from main
memory. The CPU just places its input in main memory and has to take the
result back from main memory too. How the device makes use of its own
buffer/memory should be scheduled by the device, or by the device program
compiler, according to the device's configuration.

The concept of WarpDrive is simple, but it guides the evolution of the IOMMU
subsystem in areas such as:

* Multiple process support (via ASID/PASID)
* Page faults from the device (SVM or SVA)
* Two-stage page table support (for use in a VM)

WarpDrive can also be used as a supporting technology for other solutions
such as OpenCL or CUDA.

WarpDrive is still in the RFC stage for the mainline Linux kernel. Please
join us to make it better.

The mailing list for the kernel topic:
https://lists.ozlabs.org/listinfo/linux-accelerators

The kernel branch with the HiSilicon Hi1620 accelerator and two dummy test
drivers:
https://github.com/Kenneth-Lee/linux-kernel-warpdrive

The user space framework:
https://github.com/Kenneth-Lee/warpdrive

And here is a QEMU branch with a dummy WarpDrive device, so you can test the
feature without real hardware:
Kenneth-Lee/qemu-warpdrive

For more information, please see the document at:
https://github.com/Kenneth-Lee/linux-kernel-warpdrive/blob/master/Documentation/warpdrive/warpdrive.rst